We begin with raw court case data scraped from the General District Court Online Case Information System.
We apply a series of cleaning, standardization, and data aggregation procedures, including:
- Standardizing plaintiff and defendant names (e.g., “LLC” vs. “L.L.C.”)
- Correcting common misspellings specific to these data (e.g., “Houseing” → “Housing”)
- Extracting defendant ZIP Codes for each case
- Removing errant duplicate case records
- Identifying residential defendants
- Identifying corporate/government plaintiffs (i.e., non-human entities)
- Calculating total case costs
- Determining whether a defense attorney was present
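
A few of the steps above can be sketched in Python. This is a minimal illustration, not the project's actual pipeline: the field names (`case_number`, `court`), the correction and suffix tables, and the keyword heuristic for non-person plaintiffs are all assumptions made for the example.

```python
import re

# Hypothetical corrections table; the project's real list is not given here.
MISSPELLINGS = {"HOUSEING": "HOUSING"}

# Hypothetical suffix variants mapped to one canonical spelling.
SUFFIX_PATTERNS = [
    (re.compile(r"\bL\.\s*L\.\s*C\.?(?=\s|,|$)"), "LLC"),
    (re.compile(r"\bINC\.(?=\s|,|$)"), "INC"),
]

# Five-digit ZIP, optionally followed by a ZIP+4 extension.
ZIP_RE = re.compile(r"\b(\d{5})(?:-\d{4})?\b")

# Assumed keyword heuristic for corporate/government plaintiffs.
ENTITY_RE = re.compile(r"\b(?:LLC|INC|CORP|LP|AUTHORITY|CITY OF|COUNTY OF)\b")


def standardize_name(name: str) -> str:
    """Uppercase, collapse whitespace, unify suffixes, fix known misspellings."""
    cleaned = " ".join(name.upper().split())
    for pattern, replacement in SUFFIX_PATTERNS:
        cleaned = pattern.sub(replacement, cleaned)
    words = [MISSPELLINGS.get(word, word) for word in cleaned.split(" ")]
    return " ".join(words)


def extract_zip(address: str):
    """Return the last five-digit ZIP-like token in an address, or None."""
    matches = ZIP_RE.findall(address)
    return matches[-1] if matches else None


def drop_duplicates(cases):
    """Keep the first record for each (case_number, court) pair."""
    seen = set()
    unique = []
    for case in cases:
        key = (case["case_number"], case["court"])
        if key not in seen:
            seen.add(key)
            unique.append(case)
    return unique


def is_non_person_plaintiff(name: str) -> bool:
    """Flag names containing an entity keyword after standardization."""
    return bool(ENTITY_RE.search(standardize_name(name)))


print(standardize_name("Oak Houseing  L.L.C."))               # OAK HOUSING LLC
print(extract_zip("12345 MAIN ST, RICHMOND, VA 23220-1234"))  # 23220
```

Taking the last ZIP-like token is a deliberate choice in this sketch: a five-digit street number at the start of an address would otherwise be mistaken for the ZIP Code.
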
The summaries and visualizations presented here include unlawful detainer cases filed between July 1, 2018, and March 31, 2021. Integration of cases from April through June 2021 is in progress.